Searching for Ephemeral Subsequences in Strings

Authors

  • Alberto Apostolico
  • Mikhail J. Atallah
Abstract

Let $T = a_1 \dots a_n$ be a text where every symbol $a_i$ has a time stamp $t_i$ and a duration $d(a_i)$ associated with it. The time stamps of the $a_i$'s are increasing, so that $j > i$ implies $t_j > t_i$. A text symbol $a_j$ is alive at time $t$ iff $t_j \le t \le t_j + d(a_j)$. A subsequence $a_{i_1} \dots a_{i_m}$ of $T$ is alive iff every $a_{i_k}$ is alive at time $t_{i_m}$, that is, $t_{i_k} + d(a_{i_k}) \ge t_{i_m}$ for all $k \in \{1, \dots, m-1\}$. We consider the problem of determining whether a given pattern $P = b_1 \dots b_m$ occurs as an alive subsequence of $T$. We give an off-line (i.e., the pattern is known in advance) algorithm, running in $O(n+m)$ time. We also introduce and discuss data structures for fast on-line implementation.

Index Terms: Algorithms, pattern matching, ephemeral subsequence, DAWG, forward failure function, intrusion and misuse detection.

and Dipartimento di Elettronica e Informatica, Università di Padova, Via Gradenigo 6/A, 35131 Padova, Italy. [email protected]; partially supported by NSF grant CCR-92-01078, by NATO grant CRG 900293, by the National Research Council of Italy, and by the ESPRIT III Basic Research Programme of the EC under contract No. 9072 (Project GEPPCOM).

This author gratefully acknowledges support from the COAST Project at Purdue and its sponsors, in particular Hewlett Packard, DARPA, the National Security Agency, and the Office of Research and Development
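The definition of an alive subsequence given in the abstract can be made concrete with a small brute-force check. The sketch below is an illustration only and is not the algorithm of the paper: it tries every text position as the last matched symbol and greedily matches the rest of the pattern backward, so it can take quadratic time. The names `Event` and `occurs_alive`, and the representation of the text as `(symbol, time stamp, duration)` triples, are assumptions made for this example.

```python
from typing import Sequence, Tuple

# Hypothetical representation: one text symbol = (symbol, time stamp, duration).
Event = Tuple[str, float, float]


def occurs_alive(text: Sequence[Event], pattern: str) -> bool:
    """Return True if `pattern` occurs as an alive subsequence of `text`.

    Brute-force sketch: time stamps are assumed to be strictly increasing.
    """
    if not pattern:
        return True
    m = len(pattern)
    for end, (sym, t_end, _) in enumerate(text):
        # The last pattern symbol must match at position `end`.
        if sym != pattern[-1]:
            continue
        k = m - 2      # next pattern index to match, scanning backward
        j = end - 1    # current text position, scanning backward
        while k >= 0 and j >= 0:
            s, t, d = text[j]
            # Accept a symbol only if it is still alive at time t_end.
            if s == pattern[k] and t + d >= t_end:
                k -= 1
            j -= 1
        if k < 0:
            return True
    return False


# Small example: 'b' expires at time 2, before 'c' arrives at time 4.
sample = [("a", 0, 5), ("b", 1, 1), ("c", 4, 0)]
found_ac = occurs_alive(sample, "ac")
found_bc = occurs_alive(sample, "bc")
```

In the example, `"ac"` is found because `a` stays alive until time 5, while `"bc"` is not, because `b` is alive only until time 2. Taking the nearest alive match at each backward step is safe because it leaves the largest possible prefix of the text for the remaining pattern symbols.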

Similar resources

A New Family of String Classifiers Based on Local Relatedness

This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr’s), longest common subsequences (LCSeq’s), and window-accumulated longest common subsequences (wLCSeq’s). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set)...

A Greedy Approach for Computing Longest Common Subsequences

This paper presents an algorithm for computing Longest Common Subsequences of two sequences. Given two strings X and Y of length m and n, we present a greedy algorithm that requires O(n log s) preprocessing time, where s is the number of distinct symbols appearing in string Y, and O(m) time to determine the Longest Common Subsequences.

Computing the Number of Longest Common Subsequences

This note provides very simple, efficient algorithms for computing the number of distinct longest common subsequences of two input strings and for computing the number of LCS embeddings.

Strings with Maximally Many Distinct Subsequences and Substrings

A natural problem in extremal combinatorics is to maximize the number of distinct subsequences for any length-n string over a finite alphabet Σ; this value grows exponentially, but slower than $2^n$. We use the probabilistic method to determine the maximizing string, which is a cyclically repeating string. The number of distinct subsequences is exactly enumerated by a generating function, from whi...

Finding Frequent Subsequences in a Set of Texts

Given a set of strings, the Common Subsequence Automaton accepts all common subsequences of these strings. Such an automaton can be deduced from other automata like the Directed Acyclic Subsequence Graph or the Subsequence Automaton. In this paper, we introduce some new issues in text algorithms on the basis of problems related to Common Subsequences. Firstly, we make an overview of different exist...

Publication date: 2013